SVR vs MLP for Phone Duration Modelling in HMM-based Speech Synthesis
نویسندگان
چکیده
In this paper we investigate external phone duration models (PDMs) for improving the quality of synthetic speech in hidden Markov model (HMM)-based speech synthesis. Support Vector Regression (SVR) and Multilayer Perceptron (MLP) were used for this task. SVR and MLP PDMs were compared with the explicit duration modelling of hidden semi-Markov models (HSMMs). Experiments done on an American English database showed the SVR outperforming the MLP and HSMM duration modelling on objective and subjective evaluation. In the objective test, SVR managed to outperform MLP and HSMM models achieving 15.3% and 25.09% relative improvement in terms of root mean square error (RMSE) respectively. Moreover, in the subjective evaluation test, on synthesized speech, the SVR model was preferred over the MLP and HSMM models, achieving a preference score of 35.93% and 56.30%, respectively.
منابع مشابه
Explicit duration modelling in HMM-based speech synthesis using a hybrid hidden Markov model-multilayer perceptron
In HMM-based speech synthesis, it is important to correctly model duration because it has a significant effect on the perceptual quality of speech, such as rhythm. For this reason, hidden semi-Markov model (HSMM) is commonly used to explicitly model duration instead of using the implicit state duration model of HMM through its transition probabilities. The cost of using HSMM to improve duration...
متن کاملDevelopment of a statistical parametric synthesis system for operatic singing in German
In this paper we describe the development of a Hidden Markov Model (HMM) based synthesis system for operatic singing in German, which is an extension of the HMM-based synthesis system for popular songs in Japanese and English called “Sinsy”. The implementation of this system consists of German text analysis, lexicon and Letter-To-Sound (LTS) conversion, and syllable duplication, which enables u...
متن کاملEmpirical comparison of two multilayer perceptron-based keyword speech recognition algorithms
In this paper, an empirical comparison of two multilayer perceptron (MLP)-based techniques for keyword speech recognition (wordspotting) is described. The techniques are the predictive neural model (PNM)-based wordspotting, in which the MLP is applied as a speech pattern predictor to compute a local distance between the acoustic vector and the phone model, and the hybrid HMM/MLP-based wordspott...
متن کاملA style control technique for speech synthesis using multiple regression HSMM
This paper presents a technique for controlling intuitively the degree or intensity of speaking styles and emotional expressions of synthetic speech. The conventional style control technique based on multiple regression HMM (MRHMM) has a problem that it is difficult to control phone duration of synthetic speech because HMM has no explicit parameter which models phone duration appropriately. To ...
متن کاملCross-lingual portability of MLP-based tandem features - a case study for English and Hungarian
One promising approach for building ASR systems for lessresourced languages is cross-lingual adaptation. Tandem ASR is particularly well suited to such adaptation, as it includes two cascaded modelling steps: feature extraction using multi-layer perceptrons (MLPs), followed by modelling using a standard HMM. The language-specific tuning can be performed by adjusting the HMM only, leaving the ML...
متن کامل